A fast algorithm for finding k-nearest neighbors with non-metric dissimilarity
نویسندگان
چکیده
Fast nearest neighbor (NN ) finding has been extensively studied. While some fast NN algorithms using metrics rely on the essential properties of metric spaces, the others using non-metric measures fail for large-size templates. However, in some applications with very large size templates, the best performance is achieved by NN methods based on the dissimilarity measures resulting in a special space where computations cannot be pruned by the algorithms based-on the triangular inequality. For such NN methods, the existing fast algorithms except condensing algorithms are not applicable. In this paper, a fast hierarchical search algorithm is proposed to find k-NNs using a non-metric measure in a binary feature space. Experiments with handwritten digit recognition show that the new algorithm reduces on average dissimilarity computations by more than 90% while losing the accuracy by less than 0:1%, with a 10% increase in memory.
منابع مشابه
Non-Euclidean or Non-metric Measures Can Be Informative
Statistical learning algorithms often rely on the Euclidean distance. In practice, non-Euclidean or non-metric dissimilarity measures may arise when contours, spectra or shapes are compared by edit distances or as a consequence of robust object matching [1,2]. It is an open issue whether such measures are advantageous for statistical learning or whether they should be constrained to obey the me...
متن کاملA Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...
متن کاملOn Not Making Dissimilarities Euclidean
Non-metric dissimilarity measures may arise in practice e.g. when objects represented by sensory measurements or by structural descriptions are compared. It is an open issue whether such non-metric measures should be corrected in some way to be metric or even Euclidean. The reason for such corrections is the fact that pairwise metric distances are interpreted in metric spaces, while Euclidean d...
متن کاملSome improvements on NN based classifiers in metric spaces
The nearest neighbour (NN) and k-nearest neighbour (k-NN) classification rules have been widely used in Pattern Recognition due to its simplicity and good behaviour. Exhaustive nearest neighbour search may become unpractical when facing large training sets, high dimensional data or expensive dissimilarity measures (distances). During the last years a lot of fast NN search algorithms have been d...
متن کاملNon-zero probability of nearest neighbor searching
Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is efficiently reporting the nearest data to a given object as a query. In most of the studies both the data and query are assumed to be precise, however, due to the real applications of NN searching, suc...
متن کامل